Initial non-repetitive complexity of infinite words
Abstract

The initial non-repetitive complexity function of an infinite word $x$ (first
introduced by Moothathu) is the function of $n$ that counts the number of
distinct factors of length $n$ that appear at the beginning of $x$ prior to the
first repetition of a length-$n$ factor. We examine general properties of the
initial non-repetitive complexity function, and we obtain formulas for the
initial non-repetitive complexity of the Thue-Morse word, the Fibonacci word
and the Tribonacci word.

1. Introduction

For any infinite word $x$, there is an associated complexity function $c_x$
defined as follows: the quantity $c_x(n)$ is the number of distinct factors of
length $n$ that appear in the word $x$. Properties of the complexity function
for various classes of infinite words have been extensively studied
[?, Chapter 4]. Several variants of the complexity function have been
introduced and studied, such as palindrome complexity [?] or abelian
complexity [?]. In this paper we study the initial non-repetitive complexity
function, which was first introduced by Moothathu.


We define the initial non-repetitive complexity function $\mathrm{inrc}_x(n)$
for an infinite word $x$ by

\[ \mathrm{inrc}_x(n) = \max\{ m \in \mathbb{N} : x_i \cdots x_{i+n-1} \neq x_j \cdots x_{j+n-1} \text{ for every } i, j \text{ with } 0 \le i < j \le m-1 \}. \]

In other words, $\mathrm{inrc}_x(n)$ is the maximum number of length-$n$ factors
that we see when reading $x$ from left to right prior to the first repeated
occurrence of a length-$n$ factor.
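The definition translates directly into a scan for the first repeated
length-$n$ factor. The following Python sketch is not part of the original
text: the function name `inrc` and the use of a finite prefix in place of the
infinite word are assumptions made for illustration, so the prefix must be long
enough to contain the first repetition.

```python
def inrc(prefix: str, n: int) -> int:
    """Initial non-repetitive complexity, computed from a finite prefix.

    Returns the starting index of the first length-n factor that equals an
    earlier one. That index is the number of pairwise distinct length-n
    factors read before the first repetition.
    """
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


# Prefix of length 16 of the Thue-Morse word (hypothetical test input).
thue_morse_prefix = "0110100110010110"
small_values = [inrc(thue_morse_prefix, n) for n in (1, 2, 3)]
print(small_values)
```

For this prefix the three values are 2, 3 and 6: the first repeated factors of
lengths 1, 2 and 3 start at positions 2, 3 and 6 respectively.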

Moothathu posed the following question in his paper: Is it possible to get
some idea about the topological entropy of a dynamical system by looking only
at initial segments of the orbit of some point? As an attempt to answer this
question, he introduced the quantity

\[ \limsup_{n \to \infty} \frac{\log \mathrm{inrc}_x(n)}{n}, \]

which he called "non-repetitive complexity". In this paper, we will use the term
non-repetitive complexity for the following function:

\[ \mathrm{nrc}_x(n) = \max\{ m \in \mathbb{N} : \exists k,\ x_i \cdots x_{i+n-1} \neq x_j \cdots x_{j+n-1} \text{ for every } i, j \text{ with } k \le i < j \le k+m-1 \}. \]

This paper is primarily about the function $\mathrm{inrc}_x(n)$. Although
Moothathu introduced the concept, he did not explicitly compute this function
for any particular infinite words. In a future work, it would be of interest to
study the function $\mathrm{nrc}_x(n)$, which likely has many similar
properties.

The initial non-repetitive complexity also bears some resemblance to the
quantity $R'_x(n)$, which is the length of the shortest prefix of $x$ that
contains at least one occurrence of every length-$n$ factor of $x$. There is
also a connection (which we shall make use of later) to the concept of a word
with grouped factors, which was studied by Cassaigne [?].

In the remainder of this paper, we will give some general properties of the
initial non-repetitive complexity function in comparison to the usual
complexity function. We will also give explicit formulas for the initial
non-repetitive complexity of some of the classical infinite words, namely, the
Thue-Morse word ($\mathbf{m}$), the Fibonacci word ($\mathbf{f}$) and the
Tribonacci word ($\mathbf{t}$). Finally we examine the possible range of values
that the initial non-repetitive complexity function can take for squarefree
words. We attempt to construct squarefree words with slowly growing initial
non-repetitive complexity functions. This is somewhat similar to the notion of
a highly repetitive word [?].

2. Preliminaries 

Let $\Sigma$ denote a finite alphabet and let $\Sigma^*$ denote the set of
finite words over $\Sigma$. Let $\{0,1\}$ be the alphabet in the case of the
Thue-Morse and Fibonacci words and let the alphabet be $\{0,1,2\}$ for the
Tribonacci word. If $\theta : \Sigma^* \to \Sigma^*$ is a morphism, then
$\theta^r(u)$ for a non-negative integer $r$ and a word $u$ is obtained by
applying the morphism $\theta$ to $u$ $r$ times (we define $\theta^0(u) = u$).
By convention, we denote the string of length 0 by $\epsilon$. A word $y$ is a
factor of a word $x$ if $x$ can be written as $x = uyv$ for some words $u$ and
$v$. If $x$ is a word (finite or infinite) we let $x[i \ldots j]$ denote the
factor of $x$ of length $j - i + 1$ that starts at position $i$ in $x$. We
denote the length of any finite word $u$ by $|u|$. For any letter $a$, we
denote the number of occurrences of $a$ in $u$ by $|u|_a$.

A word $x = x_1 \cdots x_n$ has period $p$ if $x_i = x_{i+p}$ for
$i = 1, \ldots, n - p$. An infinite word $w$ is ultimately periodic if
$w = uvvvv\cdots$ for some words $u$ and $v$. If $u = \epsilon$ then $w$ is
periodic. If $w$ is not ultimately periodic then it is aperiodic. If every
factor of $w$ occurs infinitely often in $w$ then $w$ is recurrent.

If $w = a\,s\,\theta(s)\,\theta^2(s)\,\theta^3(s) \cdots$, where
$\theta : \Sigma^* \to \Sigma^*$ is a morphism, $a \in \Sigma$,
$s \in \Sigma^*$, and $\theta(a) = as$, then $w$ is pure morphic. The adjacency
matrix associated with a morphism $\theta$ is the matrix $M$ with rows and
columns indexed by elements of $\Sigma$ such that the $ij$ entry of $M$ equals
$|\theta(j)|_i$.

A square is a non-empty word of the form $xx$, and a cube is a non-empty word
of the form $xxx$. More generally, if $u$ is a word with period $p$, then we
say that $u$ is an $\alpha$-power, where $\alpha = |u|/p$. An overlap is a word
of the form $axaxa$, where $a$ is a letter and $x$ is a word (possibly empty).
A word is squarefree (resp. cubefree, overlap-free) if none of its factors are
squares (resp. cubes, overlaps). For any real number $\alpha$, we say that an
infinite word is $\alpha$-powerfree if for all $\beta \ge \alpha$, none of its
factors are $\beta$-powers. A palindrome is a word that equals its reversal.

Let $\mu$ be the Thue-Morse morphism defined by $\mu(0) = 01$, $\mu(1) = 10$.
Clearly $|\mu(u)| = 2|u|$ for any factor $u$ of $\mathbf{m}$. We define the
Thue-Morse word as $\mathbf{m} = \mu^\omega(0)$. If $u = x_1 x_2 \cdots x_s$ is
a word over $\{0,1\}$ for some positive integer $s$, then we define
$\overline{u}$ by $\overline{u} = y_1 y_2 \cdots y_s$ where $y_i = 1 - x_i$.

Let $x$ be a finite or infinite word. A factor $v$ of $x$ is left special
(resp. right special) if there are distinct letters $a$ and $b$ such that $av$
and $bv$ (resp. $va$ and $vb$) are factors of $x$. A factor $v$ of $x$ is
bispecial if it is both left special and right special. An infinite word is
Sturmian if it contains exactly $n + 1$ factors of length $n$ for every
$n \ge 0$. A Sturmian word is standard if each of its prefixes is left special.

Let $\phi$ be the Fibonacci morphism defined by $\phi(0) = 01$, $\phi(1) = 0$.
We define the Fibonacci word as $\mathbf{f} = \phi^\omega(0)$. We define
$f_k = \phi^k(0)$. We define the Fibonacci sequence as $F_0 = 1$, $F_1 = 2$ and
$F_k = F_{k-1} + F_{k-2}$ for $k \ge 2$. Note that $|f_k| = F_k$ and that
$f_k = f_{k-1} f_{k-2}$ (that is, $f_k$ is the concatenation of $f_{k-1}$ with
$f_{k-2}$). Also note that the Fibonacci word is a standard Sturmian word.

Let $\sigma$ be the Tribonacci morphism defined by $\sigma(0) = 01$,
$\sigma(1) = 02$, $\sigma(2) = 0$. We define the Tribonacci word as
$\mathbf{t} = \sigma^\omega(0)$. We define $t_k = \sigma^k(0)$. We define the
Tribonacci sequence as $T_0 = 1$, $T_1 = 2$, $T_2 = 4$ and
$T_k = T_{k-1} + T_{k-2} + T_{k-3}$ for $k \ge 3$. Also, we define $t_{-1} = 2$
and $T_{-1} = 1$. Note that $|t_k| = T_k$ and that
$t_k = t_{k-1} t_{k-2} t_{k-3}$. We define
$D_k = t_{k-1} t_{k-2} \cdots t_2 t_1 t_0$ for $k \ge 1$. By convention, we
define $D_0 = \epsilon$.
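The three morphisms above can be iterated mechanically to produce prefixes of
$\mathbf{m}$, $\mathbf{f}$ and $\mathbf{t}$ and to check the length identities
$|f_k| = F_k$ and $|t_k| = T_k$. The Python sketch below is illustrative only;
the names `morphic_prefix`, `iterate` and the dictionary encoding of the
morphisms are assumptions, not notation from the text.

```python
def iterate(rules: dict, word: str, k: int) -> str:
    """Apply the morphism given by `rules` to `word` exactly k times."""
    for _ in range(k):
        word = "".join(rules[symbol] for symbol in word)
    return word


def morphic_prefix(rules: dict, letter: str, length: int) -> str:
    """Prefix of the fixed point of a morphism that is prolongable on `letter`."""
    word = letter
    while len(word) < length:
        image = "".join(rules[symbol] for symbol in word)
        if len(image) <= len(word):
            raise ValueError("the morphism does not grow on this letter")
        word = image
    return word[:length]


THUE_MORSE = {"0": "01", "1": "10"}
FIBONACCI = {"0": "01", "1": "0"}
TRIBONACCI = {"0": "01", "1": "02", "2": "0"}

# Lengths of f_k and t_k for k = 0, ..., 7.
fibonacci_lengths = [len(iterate(FIBONACCI, "0", k)) for k in range(8)]
tribonacci_lengths = [len(iterate(TRIBONACCI, "0", k)) for k in range(8)]
print(fibonacci_lengths)
print(tribonacci_lengths)
```

The two lists reproduce the sequences $F_0, \ldots, F_7$ and $T_0, \ldots, T_7$
as defined above.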

3. Some general properties of initial non-repetitive complexity 

Recall that the complexity function $c_w(n)$ satisfies $c_w(n) > n$ for any
aperiodic word $w$. This is not necessarily true for the initial non-repetitive
complexity function. Nevertheless, the initial non-repetitive complexity must
grow at least linearly for any aperiodic word $w$. Note also that the initial
non-repetitive complexity is non-decreasing.


Theorem 1. Let $w$ be an infinite word and let $\varphi$ be the golden ratio.
The following are equivalent.

1. $w$ is ultimately periodic.

2. $\mathrm{inrc}_w(n)$ is bounded.

3. $\limsup_{n \to \infty} \mathrm{inrc}_w(n)/n = 0$.

4. $\limsup_{n \to \infty} \mathrm{inrc}_w(n)/n < 1/(1 + \varphi^2)$.

Proof. The implications $1 \Rightarrow 2 \Rightarrow 3 \Rightarrow 4$ are
straightforward. We prove $4 \Rightarrow 1$.

Let $\epsilon < 1/(1 + \varphi^2)$ and suppose that there exists $N$ such that
$\mathrm{inrc}_w(n) < \epsilon n$ for all $n \ge N$. Suppose further that $N$
satisfies $\lceil (1 + \varphi^2)\epsilon(N + 1) \rceil < N$. For each
$n \ge N$, there exist integers $i_n$ and $j_n$ satisfying
$0 \le i_n < j_n < \epsilon n$ such that
$w[i_n \ldots i_n + n - 1] = w[j_n \ldots j_n + n - 1]$. Define
$p_n = j_n - i_n$ and note that $w[i_n \ldots j_n + n - 1]$ has period
$p_n < \epsilon n$. Define discrete intervals
$I_n = [i_n + \lceil \varphi^2 p_n \rceil, i_n + n - 1]$. For every
$i \in I_n$, the prefix $w[0 \ldots i - 1]$ ends with a $\varphi^2$-power.
Moreover, since

\[ i_{n+1} + \lceil \varphi^2 p_{n+1} \rceil < \epsilon(n+1) + \lceil \varphi^2 \epsilon(n+1) \rceil \le \lceil (1 + \varphi^2)\epsilon(n+1) \rceil + 1 \le n \le i_n + n, \]

we have $\bigcup_{n \ge N} I_n = [i_N + \lceil \varphi^2 p_N \rceil, \infty)$.
Thus, for every $i \ge i_N + \lceil \varphi^2 p_N \rceil$, the prefix
$w[0 \ldots i - 1]$ ends with a $\varphi^2$-power. Mignosi, Restivo, and Salemi
[16, Theorem 2] showed that this implies that $w$ is ultimately periodic, as
required. □
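The easy implication $1 \Rightarrow 2$ can be observed numerically: if
$w = uvvv\cdots$, the factor starting at position $|u|$ recurs at position
$|u| + |v|$, so $\mathrm{inrc}_w(n) \le |u| + |v|$ for every $n$. The Python
sketch below is only an illustration; the sample words `u` and `v` and the
helper `inrc` are assumptions and do not appear in the text.

```python
def inrc(prefix: str, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


# Hypothetical ultimately periodic word w = u v v v ... with u = "ab", v = "cd".
u, v = "ab", "cd"
w_prefix = u + v * 50

values = [inrc(w_prefix, n) for n in range(1, 21)]
print(values)
```

For this choice of `u` and `v` every value equals $|u| + |v| = 4$, so the
function is bounded, in line with item 2 of Theorem 1.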


This result gives an interesting new characterization of ultimate periodicity.
Later (Theorems 6 and 10) we shall compute the initial non-repetitive
complexity function for the Thue-Morse word $\mathbf{m}$ and the Fibonacci word
$\mathbf{f}$. These results imply

\[ \limsup_{n \to \infty} \frac{\mathrm{inrc}_{\mathbf{m}}(n)}{n} = 3 \]

and

\[ \limsup_{n \to \infty} \frac{\mathrm{inrc}_{\mathbf{f}}(n)}{n} = 1. \]

One may therefore reasonably wonder if the constant $1/(1 + \varphi^2)$ is
optimal in Theorem 1 or if it could perhaps be replaced by 1.


Next we show that there are infinite words whose initial non-repetitive
complexity is maximal. First, recall that for any alphabet of size $q$ and any
$n$ there exists a (non-cyclic) $q$-ary de Bruijn sequence of order $n$, that
is, a word of length $q^n + n - 1$ that contains every $q$-ary word of length
$n$ as a factor (see [15]). A cyclic $q$-ary de Bruijn sequence of order $n$ is
a word $B_n$ of length $q^n$ that contains every $q$-ary word of length $n$ as
a circular factor. Here by circular factor we mean a factor of some cyclic
shift of $B_n$.

Proposition 2.

(a) Over any alphabet of size $q \ge 3$ there exists an infinite word $w$
satisfying

\[ \mathrm{inrc}_w(n) = q^n \]

for all $n \ge 1$.

(b) Over a binary alphabet there exists an infinite word $w$ satisfying

\[ \mathrm{inrc}_w(2n) = 2^{2n} \]

for all $n \ge 1$.

Proof. This is a consequence of a result of Ivanyi (part (a) only) [?] or
Becher and Heiber [?] (both parts). They showed that over alphabets of size at
least three any (non-cyclic) de Bruijn sequence of order $n$ can be extended to
a de Bruijn sequence of order $n + 1$. Taking the limit of such extensions
gives an infinite word with the desired property. Curiously, over a binary
alphabet, de Bruijn sequences of order $n$ cannot be extended to order $n + 1$,
but can be extended to give de Bruijn sequences of order $n + 2$. □
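For a single order $n$, the defining property of a de Bruijn sequence is easy
to check by machine. The Python sketch below builds a cyclic de Bruijn sequence
with the standard Lyndon-word (Fredricksen-Kessler-Maiorana) construction and
verifies that the non-cyclic version has $q^n$ pairwise distinct factors of
length $n$. This is not the extension technique of the results cited in the
proof; the function names are illustrative assumptions.

```python
def cyclic_de_bruijn(q: int, n: int) -> list:
    """Cyclic q-ary de Bruijn sequence of order n (concatenated Lyndon words)."""
    a = [0] * (q * n)
    sequence = []

    def extend(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for symbol in range(a[t - p] + 1, q):
                a[t] = symbol
                extend(t + 1, t)

    extend(1, 1)
    return sequence


def linear_de_bruijn(q: int, n: int) -> list:
    """Non-cyclic version: repeat the first n - 1 symbols at the end."""
    cyclic = cyclic_de_bruijn(q, n)
    return cyclic + cyclic[:n - 1]


def factors(word: list, n: int) -> list:
    """All length-n factors of `word`, in order of their starting position."""
    return [tuple(word[i:i + n]) for i in range(len(word) - n + 1)]


word = linear_de_bruijn(3, 2)
print(word, len(set(factors(word, 2))))
```

Any infinite word that begins with such a non-cyclic sequence of order $n$ has
$q^n$ distinct length-$n$ factors before the first repetition, which is the
largest value $\mathrm{inrc}(n)$ can take over $q$ letters.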

Next we explore the relationship (if any) between the factor complexity and
initial non-repetitive complexity functions. The next result shows that there
are infinite words with maximal factor complexity but only linear initial
non-repetitive complexity.

Proposition 3. Let $q \ge 2$ and let $B_n$ denote a cyclic $q$-ary de Bruijn
sequence of order $n$ starting with $n$ 0's. Then

\[ x = 0^{q} B_1 0^{q^2} B_2 0^{q^3} B_3 \cdots \]

is an infinite word with complexity $q^n$ and initial non-repetitive complexity
$< 4n$ for $n \ge 1$.

Proof. Since $B_n$ contains every $q$-ary word of length $n$ as a circular
factor, having at least $n - 1$ 0's follow each $B_n$ ensures that every
$q$-ary word of length $n$ shows up in $x$. Thus $x$ has complexity $q^n$ for
all positive $n$. The factor of length $n \le q^k$ starting at the first
position of the factor $0^{q^k}$ consists of $n$ 0's. The factor of length $n$
starting at the second 0 of $0^{q^k}$ also consists of $n$ 0's. It follows that
if $n \le q^k$, then $\mathrm{inrc}_x(n)$ must be less than or equal to the
length of the prefix of $x$ ending just before the second 0 of the $0^{q^k}$
substring. That length is
$(q^{k-1} + q^{k-2} + \cdots + q) + (q^{k-1} + q^{k-2} + \cdots + q) + 1$ for
$k \ge 2$. It follows that if $q^{k-1} < n \le q^k$, then

\[
\begin{aligned}
\mathrm{inrc}_x(n) &\le \sum_{i=1}^{k-1} |B_i| + \sum_{i=1}^{k-1} q^i + 1 \\
&= \sum_{i=1}^{k-1} q^i + \sum_{i=1}^{k-1} q^i + 1 \\
&\le 2(2n - 2) + 1 \\
&< 4n.
\end{aligned}
\]

It is clear that if $n \le q$ then $\mathrm{inrc}_x(n) = 1$, which completes
the proof. □

The previous result showed that there can be a dramatic difference between
the behaviours of the factor complexity function and the initial non-repetitive
complexity function. Next we show what kind of separation is possible for these
two functions when we restrict our attention to pure morphic words. It is
well-known that pure morphic words have $O(n^2)$ factor complexity [?].

Define the morphism $\phi$ by $\phi(0) = 001$, $\phi(1) = 1$ and let
$x = \phi^\omega(0)$. It is known that $x$ has $\Theta(n^2)$ factor complexity
(see, for instance, [6, Example 4.7.67]).

Lemma 4. For all $k \ge 0$, the word $x$ has the prefix $zz$, where
$|z| = 2^{k+1} - 1$.

Proof. Since $x$ begins with 00, it begins with $\phi^k(0)\phi^k(0)$ for all
$k \ge 0$. Thus we may take $z = \phi^k(0)$. It remains to show that
$|z| = 2^{k+1} - 1$.

Let

\[ M = \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix} \]

be the adjacency matrix associated with $\phi$. Then an easy induction shows
that

\[ M^k = \begin{pmatrix} 2^k & 0 \\ 2^k - 1 & 1 \end{pmatrix} \]

for $k \ge 0$. Now we have

\[ |z| = |\phi^k(0)| = |\phi^k(0)|_0 + |\phi^k(0)|_1 = 2^k + 2^k - 1 = 2^{k+1} - 1, \]

as required. □


Proposition 5. $\mathrm{inrc}_x(n) < 2n$ for $n \ge 1$.

Proof. From Lemma 4 we have that if $n \le 2^{k+1} - 1$, then
$\mathrm{inrc}_x(n) \le 2^{k+1} - 1$. It follows that if
$2^k - 1 < n \le 2^{k+1} - 1$, then
$\mathrm{inrc}_x(n) \le 2^{k+1} - 1 = 2(2^k) - 1 \le 2n - 1 < 2n$. □
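Both Lemma 4 and Proposition 5 can be spot-checked on a finite prefix of
$x = \phi^\omega(0)$. The following Python sketch is an illustration under
stated assumptions: the helper `inrc`, the prefix length 2000 and the tested
range of $n$ are choices made here, not part of the argument above.

```python
PHI = {"0": "001", "1": "1"}   # the morphism phi(0) = 001, phi(1) = 1


def inrc(prefix: str, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


# Lengths |phi^k(0)| for k = 0, ..., 8 (Lemma 4 predicts 2^(k+1) - 1).
power_lengths = []
word = "0"
for k in range(9):
    power_lengths.append(len(word))
    word = "".join(PHI[symbol] for symbol in word)

# A prefix of x: iterate phi until the word is long enough.
x_prefix = "0"
while len(x_prefix) < 2000:
    x_prefix = "".join(PHI[symbol] for symbol in x_prefix)
x_prefix = x_prefix[:2000]

# Proposition 5 on the range 1 <= n <= 300.
bound_holds = all(inrc(x_prefix, n) < 2 * n for n in range(1, 301))
print(power_lengths, bound_holds)
```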


4. Initial non-repetitive complexity of the Thue-Morse word

We now begin to compute explicitly the initial non-repetitive complexity
functions for some of the classical infinite words, beginning with the
Thue-Morse word. Recall that the Thue-Morse word is the word
$\mathbf{m} = \mu^\omega(0)$, where $\mu(0) = 01$ and $\mu(1) = 10$.

Theorem 6. If $2^{k-1} < n \le 2^k$ for $k \ge 1$, then
$\mathrm{inrc}_{\mathbf{m}}(n) = 3(2^{k-1})$.

The proof of the theorem will follow from the following lemmas. First note
that it follows easily from the definition of $\mu$ that if $A$ is a prefix of
$\mathbf{m}$ of length $2^k$ for $k \ge 0$, then $\mu(A) = A\overline{A}$.

Proposition 7. If $n \le 2^k$ for $k \ge 1$, then
$\mathrm{inrc}_{\mathbf{m}}(n) \le 3(2^{k-1})$.

Proof. We show that
$\mathbf{m}[0 \ldots 2^k - 1] = \mathbf{m}[2^k + 2^{k-1} \ldots 2^{k+1} + 2^{k-1} - 1]$.
Let $A = \mathbf{m}[0 \ldots 2^{k-1} - 1]$. Then
$\mu^3(A) = \mu^2(A\overline{A}) = \mu(A\overline{A}\,\overline{A}A) = A\overline{A}\,\overline{A}A\overline{A}AA\overline{A}$
is a prefix of $\mathbf{m}$. Since $|A| = 2^{k-1}$, we have
$A\overline{A} = \mathbf{m}[0 \ldots 2^k - 1]$ and
$A\overline{A} = \mathbf{m}[2^k + 2^{k-1} \ldots 2^{k+1} + 2^{k-1} - 1]$. The
upper bound for $\mathrm{inrc}_{\mathbf{m}}(n)$ follows immediately. □

We make use of the next result to obtain a matching lower bound.

Lemma 8. [?, Example 10.10.3] For any integer $k \ge 2$, each factor of
$\mathbf{m}$ of length $2^{k-1} + 1$ occurs in the prefix of $\mathbf{m}$ of
length $2^{k+1}$. Furthermore, each one of these factors occurs exactly once in
this prefix.

Proposition 9. If $2^{k-1} < n$ for $k \ge 2$, then
$\mathrm{inrc}_{\mathbf{m}}(n) \ge 3(2^{k-1})$.

Proof. By Lemma 8 the first $2^{k+1} - (2^{k-1} + 1) + 1 = 3(2^{k-1})$ factors
of length $2^{k-1} + 1$ appearing in $\mathbf{m}$ are all distinct.
Consequently the first $3(2^{k-1})$ length-$n$ factors appearing in
$\mathbf{m}$ must also be distinct. □

Using Propositions 7 and 9 and that $\mathrm{inrc}_{\mathbf{m}}(2) = 3$
(obtained through observation), we get Theorem 6 and thus the proof is
complete. Though the theorem does not cover $n = 1$, note that
$\mathrm{inrc}_{\mathbf{m}}(1) = 2$.
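Theorem 6 lends itself to a direct numerical check on a prefix of
$\mathbf{m}$. The Python sketch below is an illustration rather than part of
the proof; the helper names, the prefix length 1024 and the tested range
$2 \le n \le 128$ are assumptions made here.

```python
MU = {"0": "01", "1": "10"}   # the Thue-Morse morphism


def inrc(prefix: str, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


def theorem6_value(n: int) -> int:
    """3 * 2^(k-1), where k >= 1 is determined by 2^(k-1) < n <= 2^k (n >= 2)."""
    k = (n - 1).bit_length()
    return 3 * 2 ** (k - 1)


# Prefix of m of length 1024, obtained by iterating mu on the letter 0.
m_prefix = "0"
while len(m_prefix) < 1024:
    m_prefix = "".join(MU[symbol] for symbol in m_prefix)

computed = {n: inrc(m_prefix, n) for n in range(1, 129)}
agrees = all(computed[n] == theorem6_value(n) for n in range(2, 129))
print(computed[1], agrees)
```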

5. Initial non-repetitive complexity of the Fibonacci word

Recall that the Fibonacci word is the word $\mathbf{f} = \phi^\omega(0)$, where
$\phi(0) = 01$ and $\phi(1) = 0$.

Theorem 10. If $F_{k-1} \le n + 1 < F_k$ for $k \ge 2$, then
$\mathrm{inrc}_{\mathbf{f}}(n) = F_{k-1}$.

We first need some preliminary results. Recall that $f_k = \phi^k(0)$.

Lemma 11. [?, Chapter 2] For $k \ge 2$, the words $f_k = f_{k-1} f_{k-2}$ and
$f_{k-2} f_{k-1}$ differ only by their last two letters.

Proposition 12. If $n + 1 < F_k$ for $k \ge 2$, then
$\mathrm{inrc}_{\mathbf{f}}(n) \le F_{k-1}$.

Proof. We show that for any positive integer $k \ge 2$,

\[ \mathbf{f}[0 \ldots F_k - 3] = \mathbf{f}[F_{k-1} \ldots F_{k+1} - 3]. \]

We know $f_{k+1} = f_k f_{k-1} = f_{k-1} f_{k-2} f_{k-1}$ is a prefix of
$\mathbf{f}$. By Lemma 11, $\mathbf{f}[0 \ldots F_k - 1]$ and
$\mathbf{f}[F_{k-1} \ldots F_{k+1} - 1]$ agree up to but not including the last
two positions. The result follows. □


Furthermore, the Fibonacci word is a standard Sturmian word, so for $k \ge 1$,
$f_k = u_k rs$, where $rs = 01$ if $k$ is odd or $rs = 10$ if $k$ is even. The
$u_k$'s are known as central words and it is known that these central words are
palindromes and are bispecial (see [?, Chapter 2]).

A semicentral word [?] is a word in which the longest repeated prefix, longest
repeated suffix, longest left special factor and longest right special factor
are all the same word. Furthermore, this prefix/suffix/bispecial factor is a
central word.


Lemma 13. [?, Proposition 16] The semicentral prefixes of a standard Sturmian
word are precisely the words of the form $u_k rs\, u_k$ for $k \ge 1$.

The property described in the next lemma is the property of having grouped
factors, which was mentioned in the introduction.

Lemma 14. [?, Corollary 1] A sequence is Sturmian if and only if, for
$n \ge 0$, it has a factor of length $2n$ containing all factors of length $n$
exactly once. Furthermore, if $n \ge 1$, then there are exactly two such
factors of length $2n$, namely $w01v$ and $w10v$, where $w$ is the unique right
special factor of length $n - 1$ and $v$ is the unique left special factor of
length $n - 1$.

Proposition 15. If $F_{k-1} \le n + 1$ for $k \ge 2$, then
$\mathrm{inrc}_{\mathbf{f}}(n) \ge F_{k-1}$.

Proof. It suffices to show that for $n = F_{k-1} - 1$, the first $F_{k-1}$
factors of $\mathbf{f}$ of length $n$ are all distinct. We know from Lemma 13
that the Fibonacci word has the prefix $u_{k-1} rs\, u_{k-1}$ where $rs = 01$
or $rs = 10$. Since these prefixes are of the same construction as the factors
detailed in Lemma 14 ($u_{k-1}$ is the left and right special factor of length
$n - 1$), and since

\[ |u_{k-1} rs\, u_{k-1}| = 2|u_{k-1} r| = 2(F_{k-1} - 1) = 2n, \]

it follows that this semicentral prefix contains all factors of length $n$
exactly once. Thus for all $n \ge F_{k-1} - 1$, the factors of length $n$ that
begin at positions $0, \ldots, F_{k-1} - 1$ are pairwise distinct, and so the
result follows. □

Using Propositions 12 and 15 we get Theorem 10 and thus the proof is
complete.
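As with the Thue-Morse word, the formula of Theorem 10 can be compared with a
direct computation on a prefix of $\mathbf{f}$. The Python sketch below assumes
the indexing $F_0 = 1$, $F_1 = 2$ from the Preliminaries; the helper names, the
prefix length 2000 and the tested range of $n$ are choices made for
illustration.

```python
PHI = {"0": "01", "1": "0"}   # the Fibonacci morphism


def inrc(prefix: str, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


def theorem10_value(n: int) -> int:
    """F_{k-1}, where F_{k-1} <= n + 1 < F_k with F_0 = 1 and F_1 = 2."""
    previous, current = 1, 2
    while current <= n + 1:
        previous, current = current, previous + current
    return previous


# Prefix of f of length 2000, obtained by iterating phi on the letter 0.
f_prefix = "0"
while len(f_prefix) < 2000:
    f_prefix = "".join(PHI[symbol] for symbol in f_prefix)
f_prefix = f_prefix[:2000]

agrees = all(inrc(f_prefix, n) == theorem10_value(n) for n in range(1, 151))
print([inrc(f_prefix, n) for n in range(1, 8)], agrees)
```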


6. Initial non-repetitive complexity of the Tribonacci word

Recall that the Tribonacci word is the word $\mathbf{t} = \sigma^\omega(0)$,
where $\sigma(0) = 01$, $\sigma(1) = 02$, and $\sigma(2) = 0$.

Theorem 16. If
$\frac{T_k + T_{k-2} - 3}{2} < n \le \frac{T_{k+1} + T_{k-1} - 3}{2}$ for
$k \ge 1$, then $\mathrm{inrc}_{\mathbf{t}}(n) = T_k$.

We first need to recall some known properties of the Tribonacci word. Recall
that $t_k = \sigma^k(0)$ and that $D_k = t_{k-1} t_{k-2} \cdots t_0$ for
$k \ge 1$.

Lemma 17. [20, Theorem 2.5] For $k \ge 2$, the longest common prefix of
$t_{k-3} t_{k-1} t_{k-2}$ and $t_k$ is $D_{k-2}$.


Lemma 18. [20, Proposition 2.9] For $k \ge 1$,
$|D_k| = \frac{T_{k+1} + T_{k-1} - 3}{2}$.

Lemma 19. For any positive integer $k \ge 2$,

\[ \mathbf{t}\left[0 \ldots \frac{T_{k+1} + T_{k-1} - 3}{2} - 1\right] = \mathbf{t}\left[T_k \ldots T_k + \frac{T_{k+1} + T_{k-1} - 3}{2} - 1\right]. \]

Proof. We know
$t_{k+2} = t_{k+1} t_k t_{k-1} = t_k t_{k-1} t_{k-2} t_k t_{k-1} = t_{k-1} t_{k-2} t_{k-3} t_{k-1} t_{k-2} t_k t_{k-1}$
is a prefix of $\mathbf{t}$ for $k \ge 2$. By Lemma 17 we know that
$t_{k-3} t_{k-1} t_{k-2}$ agrees with $t_k$ up to the first $|D_{k-2}|$
symbols. It follows that $t_{k-1} t_{k-2} t_{k-3} t_{k-1} t_{k-2}$ agrees with
$t_{k-1} t_{k-2} t_k$ up to the first $|D_k|$ symbols. Since
$t_{k-1} t_{k-2} t_k = \mathbf{t}[T_k \ldots T_k + T_{k+1} - 1]$, the result
follows from Lemma 18. □


We therefore have the following.

Proposition 20. If $n \le \frac{T_{k+1} + T_{k-1} - 3}{2}$ for $k \ge 2$, then
$\mathrm{inrc}_{\mathbf{t}}(n) \le T_k$.

Before proving the lower bound for $\mathrm{inrc}_{\mathbf{t}}(n)$, we need
some additional properties of the Tribonacci word.


Lemma 21. [?, Proof of Proposition 3.3] The bispecial factors of $\mathbf{t}$
are precisely the palindromic prefixes of $\mathbf{t}$. Furthermore, the
lengths of these (nonempty) prefixes are $|D_k|$ for $k \ge 1$.

Lemma 22. [20, Lemma 2.3] If $w$ is a palindrome, then $\sigma(w)0$ is a
palindrome.

Lemma 23. If $w$ is a palindromic prefix of $\mathbf{t}$ of length $|D_k|$ for
$k \ge 1$, then $\sigma(w)0$ is a palindromic prefix of $\mathbf{t}$ of length
$|D_{k+1}|$.


Proof. We know from Lemma 21 that all palindromic prefixes of $\mathbf{t}$ are
of length $|D_k|$ for $k \ge 1$. If $w$ is a palindromic prefix of $\mathbf{t}$
of length $|D_k|$, then clearly $\sigma(w)$ is a prefix of $\mathbf{t}$.
Furthermore, since $w$ starts with a 0 and is a palindrome, it ends with a 0.
So $\sigma(w)$ ends with a 1. Since the strings 11 and 12 are not factors of
$\mathbf{t}$, the word $\sigma(w)$ must be followed by a 0. Thus, $\sigma(w)0$
is a prefix of $\mathbf{t}$ and we know from Lemma 22 that it is a palindrome.
Applying the morphism $\sigma$ to $w$ will at most double the length. Thus

\[
\begin{aligned}
|\sigma(w)0| &\le 2|w| + 1 \\
&= 2\left(\frac{T_{k+1} + T_{k-1} - 3}{2}\right) + 1 \\
&= \frac{2T_{k+1} + 2T_{k-1} - 4}{2} \\
&< \frac{T_{k+2} + T_{k+1} + T_k + T_k + T_{k-1} + T_{k-2} - 3}{2} \\
&= \frac{T_{k+3} + T_{k+1} - 3}{2} \\
&= |D_{k+2}|.
\end{aligned}
\]

So the only option for the length of $\sigma(w)0$ is $|D_{k+1}|$. □

Lemma 24. If $w$ is a (nonempty) palindromic prefix of $\mathbf{t}$, then the
first symbols that follow each of the first two occurrences of $w$ in
$\mathbf{t}$ are different.

Proof. By induction on $k$ where $|w| = |D_k|$. Since
$\mathbf{t} = 0102\cdots$, the result holds for $k = 1$. Assume that the first
symbols that follow each of the first two occurrences of $w$ are different
where $|w| = |D_k|$. Since 21 and 22 are not factors of $\mathbf{t}$, each of
these occurrences of $w$ is followed by a different word among 0, 1, and 20.
Now, since $\sigma(w0) = \sigma(w)01$, $\sigma(w1) = \sigma(w)02$, and
$\sigma(w20) = \sigma(w)001$, we see that the first two occurrences of
$\sigma(w)0$ are followed by different symbols. Since
$|\sigma(w)0| = |D_{k+1}|$ by Lemma 23, this implies that the statement holds
for $k + 1$ and thus the statement holds for all $k$ by induction. □
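The statements of Lemmas 21, 23 and 24 about palindromic prefixes can be
observed on a finite prefix of $\mathbf{t}$. The Python sketch below is an
illustration only: the prefix length 3000, the range of $k$ and all helper
names are assumptions, and the lengths $|D_k| = T_{k-1} + \cdots + T_0$ are
computed from the definition of $D_k$ in the Preliminaries.

```python
SIGMA = {"0": "01", "1": "02", "2": "0"}   # the Tribonacci morphism

# Prefix of t of length 3000, obtained by iterating sigma on the letter 0.
t_prefix = "0"
while len(t_prefix) < 3000:
    t_prefix = "".join(SIGMA[symbol] for symbol in t_prefix)
t_prefix = t_prefix[:3000]

# Tribonacci numbers T_0, T_1, ... and the lengths D[k] = |D_k| (D[0] = 0).
T = [1, 2, 4]
while len(T) < 12:
    T.append(T[-1] + T[-2] + T[-3])
D = [sum(T[:k]) for k in range(len(T))]

# Lengths below 100 of the palindromic prefixes of t (compare Lemma 21).
palindromic_lengths = [
    length for length in range(1, 100)
    if t_prefix[:length] == t_prefix[:length][::-1]
]


def next_palindrome(w: str) -> str:
    """The word sigma(w)0 of Lemma 23."""
    return "".join(SIGMA[symbol] for symbol in w) + "0"


def following_symbols(k: int) -> tuple:
    """Symbols following the first two occurrences of the prefix of length |D_k|."""
    w = t_prefix[:D[k]]
    second = t_prefix.find(w, 1)   # first occurrence starting after position 0
    return t_prefix[D[k]], t_prefix[second + D[k]]


print(palindromic_lengths)
print([following_symbols(k) for k in range(1, 7)])
```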

The following is a well-known property of $\mathbf{t}$.

Lemma 25. There is a unique left special factor and a unique right special
factor of each length in $\mathbf{t}$.

Lemma 26. Let $v$ denote the prefix of length $|D_{k-1}|$ of $\mathbf{t}$ for
$k \ge 2$. All the factors of length $|D_{k-1}|$ that start between the
beginning of the first occurrence of $v$ and the beginning of the third
occurrence of $v$ are distinct (except for $v$).

Proof. Firstly, since $\mathbf{t}$ is recurrent, we know that there are three
occurrences of $v$ in $\mathbf{t}$. For the sake of contradiction, assume the
factor $u$ ($u \neq v$) of length $|D_{k-1}|$ has two occurrences in
$\mathbf{t}$ before we reach the first symbol of the third occurrence of $v$.
For simplicity, let $v_j$ denote the $j$th occurrence of $v$ and $u_i$ the
$i$th occurrence of $u$. If the starting symbol of $u_i$ is between the
starting symbols of $v_j$ and $v_{j+1}$, then we will denote that by
$v_j < u_i < v_{j+1}$.

Case 1: $v_1 < u_1 < u_2 < v_2$.

If $u_1$ and $u_2$ are preceded by different symbols, then $u$ is a left
special factor. This is a contradiction since $u \neq v$ and $v$ is the unique
left special factor of length $|D_{k-1}|$ in $\mathbf{t}$. Thus, assume they
are preceded by the same symbol. Then we obtain another factor (formed by the
first $|D_{k-1}| - 1$ symbols of $u$ and the symbol preceding $u_1$), which we
will call $r$, of length $|D_{k-1}|$ such that $v_1 \le r_1 < r_2 < u_2$. Once
again, if $r_1$ and $r_2$ are preceded by different symbols then we obtain a
contradiction. By repeating this argument we eventually find that
$v_1 < v_j < u_2$ for some $j$, which contradicts our original assumption.

Case 2: $v_2 < u_1 < u_2 < v_3$.

Similar to Case 1.

Case 3: $v_1 < u_1 < v_2 < u_2 < v_3$.

We apply the same argument as in Case 1. We either obtain the same
contradiction described in that case, or we find that the factor starting with
the first symbol of $v_1$ and ending with the last symbol of $u_1$ is identical
to the factor starting with the first symbol of $v_2$ and ending with the last
symbol of $u_2$. This is a contradiction since the symbols following $v_1$ and
$v_2$ are different by Lemma 24.

In all three cases we obtain a contradiction. Thus all the factors of length
$|D_{k-1}|$ (except $v$) are distinct. □

Lemma 27. Let $k \ge 2$. All factors of $\mathbf{t}$ of length $|D_{k-1}| + 1$
that begin prior to the third occurrence of the prefix of $\mathbf{t}$ of
length $|D_{k-1}|$ are distinct.

Proof. It is a direct result of Lemmas 24 and 26. □

Lemma 28. [?, Section 6.3.5] If a square $xx$ is a factor of $\mathbf{t}$, then
$|x| \in \{T_k, T_k + T_{k-1}\}$ for some $k \ge 1$.

Lemma 29. If a word $v$ of length $|D_{k-1}|$ for $k \ge 5$ overlaps itself in
$\mathbf{t}$, then the shortest period of $v$ is at least $T_{k-2}$.

Proof. The largest Tribonacci number or sum of consecutive Tribonacci numbers
less than $T_{k-2}$ is $T_{k-3} + T_{k-4}$. Let $v = xax$ be a factor of
$\mathbf{t}$ of length $|D_{k-1}|$ where $x$ is a nonempty factor of
$\mathbf{t}$ and $a$ is a possibly empty factor of $\mathbf{t}$. Suppose that
$\mathbf{t}$ contains the overlap $xaxax$. Note that $|xa|$ is a period of $v$.
Also, $|xa| < |D_{k-1}|$ and $2|xa| \ge |D_{k-1}|$. However,

\[ |D_{k-1}| = T_{k-2} + T_{k-3} + T_{k-4} + \cdots + T_0 = 2T_{k-3} + 2T_{k-4} + 2T_{k-5} + T_{k-6} + \cdots + T_0 > 2(T_{k-3} + T_{k-4}) \]

for $k \ge 5$. So every period of $v$ must be larger than $T_{k-3} + T_{k-4}$.
Thus, from Lemma 28 the shortest period of $v$ is at least $T_{k-2}$. □

Lemma 30. If $v$ is a prefix of $\mathbf{t}$ of length $|D_{k-1}|$ for
$k \ge 2$, then the second occurrence of $v$ occurs at position $T_{k-1}$ and
the third occurs at position $T_k$.

Proof. Since $\mathbf{t} = 01020100102010102010\cdots$, it can be observed that
the statement holds for $k = 2, 3, 4$. Thus, assume for the rest of this proof
that $k \ge 5$. By Lemma 19 we already know that the prefix $v$ occurs at
position $T_{k-1}$ and position $T_k$. If there were an occurrence of $v$ that
started somewhere between the beginning of $\mathbf{t}$ and position $T_{k-1}$
of $\mathbf{t}$, then by Lemma 29 (note that $|D_{k-1}| \ge T_{k-1}$), the
start of this occurrence of $v$ must be at distance at least $T_{k-2}$ from the
beginning of $\mathbf{t}$ and at distance at least $T_{k-2}$ from position
$T_{k-1}$. This implies that $2T_{k-2} \le T_{k-1}$ but to the contrary we have
$2T_{k-2} = T_{k-2} + T_{k-3} + T_{k-4} + T_{k-5} = T_{k-1} + T_{k-5} > T_{k-1}$.
Furthermore, $T_{k-1} + 2T_{k-2} > T_{k-1} + T_{k-2} + T_{k-3} = T_k$. It
follows that no occurrence of $v$ can start between the beginning of
$\mathbf{t}$ and position $T_{k-1}$ nor can it start anywhere between positions
$T_{k-1}$ and $T_k$. □

Proposition 31. If $n > \frac{T_k + T_{k-2} - 3}{2}$ for $k \ge 2$, then
$\mathrm{inrc}_{\mathbf{t}}(n) \ge T_k$.

Proof. The result follows from Lemmas 27 and 30. □

Using Propositions 20 and 31 and that $\mathrm{inrc}_{\mathbf{t}}(1) = 2$
(making the theorem hold for $k = 1$), we get Theorem 16 and thus the proof is
complete.
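The bounds of this section can be compared with a direct computation on a
prefix of $\mathbf{t}$. The Python sketch below assumes that the thresholds are
the lengths $|D_k| = (T_{k+1} + T_{k-1} - 3)/2 = T_{k-1} + \cdots + T_0$ of
Lemma 18, so that the expected value for $|D_{k-1}| < n \le |D_k|$ is $T_k$.
The helper names, the prefix length 3000 and the tested range of $n$ are
choices made for illustration.

```python
SIGMA = {"0": "01", "1": "02", "2": "0"}   # the Tribonacci morphism


def inrc(prefix: str, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


def expected_value(n: int) -> int:
    """T_k for the unique k >= 1 with |D_{k-1}| < n <= |D_k|."""
    T = [1, 2, 4]                       # T_0, T_1, T_2
    while len(T) < 40:
        T.append(T[-1] + T[-2] + T[-3])
    d_length = 0                        # |D_0| = 0
    for k in range(1, len(T)):
        d_length += T[k - 1]            # now d_length = |D_k|
        if n <= d_length:
            return T[k]
    raise ValueError("n is too large for the precomputed Tribonacci numbers")


# Prefix of t of length 3000, obtained by iterating sigma on the letter 0.
t_prefix = "0"
while len(t_prefix) < 3000:
    t_prefix = "".join(SIGMA[symbol] for symbol in t_prefix)
t_prefix = t_prefix[:3000]

agrees = all(inrc(t_prefix, n) == expected_value(n) for n in range(1, 121))
print([inrc(t_prefix, n) for n in range(1, 9)], agrees)
```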

7. Initial non-repetitive complexity of squarefree words 

In this section we examine the possible behaviour of the initial non-repetitive 
complexity function for words avoiding squares or cubes. In particular, we 
attempt to construct words that avoid the desired type of repetition but have 
initial non-repetitive complexity as low as possible. 

Proposition 32.

1. There is no infinite squarefree word $x$ that has
$\mathrm{inrc}_x(n) < 2n$ for all $n$.

2. There is no infinite cubefree word $x$ that has
$\mathrm{inrc}_x(n) < \frac{3}{2}n$ for all $n$.

Proof. 1. If $\mathrm{inrc}_x(n) < 2n$ for all $n$, then
$\mathrm{inrc}_x(1) = 1$ and therefore $x = aa\cdots$, a contradiction.

2. If $\mathrm{inrc}_x(n) < \frac{3}{2}n$ for all $n$, then
$\mathrm{inrc}_x(1) = 1$ and $\mathrm{inrc}_x(2) \le 2$. It follows that
$x = aaa\cdots$, which is a contradiction. □

Consider the infinite alphabet $\Sigma = \{0, 1, 2, \ldots\}$. We define the
sequence of Zimin words, $Z_0, Z_1, Z_2, \ldots$, as follows: $Z_0 = \epsilon$
and $Z_{n+1} = Z_n n Z_n$ for $n \ge 0$.

Let

\[ x = 0102010301020104\cdots, \]

also known as the ruler sequence, be the limit of the Zimin words.


Theorem 33. The infinite word $x$ is squarefree and satisfies
$n < \mathrm{inrc}_x(n) \le 2n$ for all $n \ge 1$.

Proof. For the squarefreeness of $x$ see [12]. By the definition of $x$, if
$n \le 2^k - 1$, then $\mathrm{inrc}_x(n) \le 2^k$ for $k \ge 1$. It follows
that if $2^{k-1} \le n < 2^k$, then
$\mathrm{inrc}_x(n) \le 2^k = 2(2^{k-1}) \le 2n$ for all $n$. Also, since $x$
is squarefree, clearly $n < \mathrm{inrc}_x(n)$ for all $n$. □

So we can obtain an infinite squarefree word over an infinite alphabet that
has $\mathrm{inrc}_x(n) \le 2n$ for all $n$. Furthermore, for this word there
are infinitely many values of $n$ such that $\mathrm{inrc}_x(n) = 2n$.
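The Zimin words and the bounds of Theorem 33 are easy to reproduce. In the
Python sketch below, words over the infinite alphabet are represented as lists
of integers; the helper names and the choice of $Z_{10}$ (of length 1023) as a
finite stand-in for $x$ are assumptions made for illustration.

```python
def zimin(k: int) -> list:
    """The Zimin word Z_k, where Z_0 is empty and Z_{j+1} = Z_j j Z_j."""
    word = []
    for letter in range(k):
        word = word + [letter] + word
    return word


def inrc(prefix: list, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = tuple(prefix[start:start + n])
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


x_prefix = zimin(10)                       # a prefix of the ruler sequence
values = {n: inrc(x_prefix, n) for n in range(1, 129)}

within_bounds = all(n < values[n] <= 2 * n for n in values)
tight_at_powers_of_two = all(values[2 ** j] == 2 ** (j + 1) for j in range(8))
print(within_bounds, tight_at_powers_of_two)
```

On this range the upper bound $2n$ is attained exactly when $n$ is a power of
two, which gives concrete instances of the equality case mentioned above.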

Using an infinite alphabet may seem like "cheating", so next we examine what
can be done over a finite alphabet. We will make use of a morphism
$\theta : \{0, 1, 2, \ldots\}^* \to \{a, b, c, d, e\}^*$, which maps squarefree
words over an infinite alphabet to squarefree words over an alphabet of size 5.
First, let

\[ w = abcacbabcbacabc\cdots \]

be the well-known squarefree word obtained by iterating the morphism
$a \to abc$, $b \to ac$, $c \to b$.

For $i \ge 0$, let $W_i$ be the prefix of $w$ of length $i$. We define
$\theta(i) = d W_i e W_i$ for all $i \ge 0$. The map $\theta$ is squarefree
[?, Corollary 1.4]; that is, if $u$ is squarefree, then $\theta(u)$ is
squarefree.
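To make the construction concrete, the following Python sketch builds the
prefixes $W_i$, the images $\theta(i)$ and a prefix of $\theta$ applied to the
ruler sequence. It assumes the reading $\theta(i) = d W_i e W_i$ with the two
extra letters $d$ and $e$ as markers; the helper names and the length of the
computed prefix are illustrative choices.

```python
W_RULES = {"a": "abc", "b": "ac", "c": "b"}


def w_prefix(length: int) -> str:
    """Prefix of the fixed point w of the morphism a -> abc, b -> ac, c -> b."""
    word = "a"
    while len(word) < length:
        word = "".join(W_RULES[symbol] for symbol in word)
    return word[:length]


def theta(i: int) -> str:
    """theta(i) = d W_i e W_i, where W_i is the prefix of w of length i."""
    prefix = w_prefix(i)
    return "d" + prefix + "e" + prefix


def zimin(k: int) -> list:
    """The Zimin word Z_k, where Z_0 is empty and Z_{j+1} = Z_j j Z_j."""
    word = []
    for letter in range(k):
        word = word + [letter] + word
    return word


def inrc(prefix: str, n: int) -> int:
    """Index of the first length-n factor of `prefix` that repeats an earlier one."""
    seen = set()
    for start in range(len(prefix) - n + 1):
        factor = prefix[start:start + n]
        if factor in seen:
            return start
        seen.add(factor)
    raise ValueError("prefix too short: no length-n factor repeats within it")


ruler_prefix = zimin(8)                               # prefix of x, length 255
y_prefix = "".join(theta(i) for i in ruler_prefix)    # prefix of theta(x)

image_lengths = [len(theta(i)) for i in range(10)]
print(y_prefix[:14], image_lengths, [inrc(y_prefix, n) for n in (1, 2, 3)])
```

Each image has length $|\theta(i)| = 2(i + 1)$, and for $n = 2$ the computed
value is $6 = 3n$, which is why a bound of the form "$< 3n$" on this word has
to exclude $n = 2$.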

Theorem 34. Let $x$ be the ruler sequence defined previously. Then
$y = \theta(x)$ is a squarefree word with $\mathrm{inrc}_y(n) < 3n$ for all $n$
except $n = 2$.

Proof. It is relatively easy to see that the prefix $A$ of $x$ of length
$2^k - 1$ will have $2^{k-1}$ 0's, $2^{k-2}$ 1's, and so on, down to only one
occurrence of $k - 1$. Furthermore, as a result of how we defined the $W_i$'s,
we have $|\theta(i)| = 2(i + 1)$. Thus, we have

\[
\begin{aligned}
|\theta(A)| &= 2k + 2(2(k-1)) + 4(2(k-2)) + \cdots + 2^{k-1}(2(1)) \\
&= \sum_{i=0}^{k-1} 2^{i+1}(k - i) \\
&= 2^{k+2} - 2k - 4.
\end{aligned}
\]

Furthermore, if $B$ is the prefix of $x$ of length $2^k$, then

\[ |\theta(B)| = |\theta(A)| + 2(k+1) = 2^{k+2} - 2k - 4 + 2k + 2 = 2^{k+2} - 2, \]

since $|\theta(k)| = 2(k + 1)$. As a result of the fact (see proof of
Theorem 33) that $\mathrm{inrc}_x(n) \le 2^k$ for $2^{k-1} \le n < 2^k$, if
$2^{k+1} - 2(k-1) - 4 < n \le 2^{k+2} - 2k - 4$, then
$\mathrm{inrc}_y(n) \le 2^{k+2} - 2$. The expression

\[ \frac{2^{k+2} - 2}{2^{k+1} - 2(k-1) - 3} \]

is a decreasing function of $k$ and is less than 3 for $k \ge 4$. Along with
the fact that $\mathrm{inrc}_y(n) < 3n$ for $n \le 22$ (other than $n = 2$),
which can be obtained through computation, we have $\mathrm{inrc}_y(n) < 3n$
for all $n$ except $n = 2$. □
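The claim about the ratio used in this proof can be confirmed with exact
rational arithmetic. The Python sketch below is only a numerical sanity check
over a finite range of $k$ (an assumption of this example), not a replacement
for the monotonicity argument; the function name `ratio` is illustrative.

```python
from fractions import Fraction


def ratio(k: int) -> Fraction:
    """The expression (2^(k+2) - 2) / (2^(k+1) - 2(k-1) - 3) from the proof."""
    return Fraction(2 ** (k + 2) - 2, 2 ** (k + 1) - 2 * (k - 1) - 3)


ratios = [ratio(k) for k in range(4, 41)]

below_three = all(value < 3 for value in ratios)
decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))
above_two = all(value > 2 for value in ratios)
print(ratio(3), ratio(4), below_three, decreasing, above_two)
```

The value at $k = 3$ is $10/3$, which exceeds 3, so the threshold $k \ge 4$ in
the proof cannot be lowered; the values stay above 2 and decrease towards it.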

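It should be noted that

\[ \lim_{k \to \infty} \frac{2^{k+2} - 2}{2^{k+2} - 2k - 4} = 1 \quad \text{and} \quad \lim_{k \to \infty} \frac{2^{k+2} - 2}{2^{k+1} - 2(k-1) - 3} = 2, \]

meaning that

\[ \liminf_{n \to \infty} \frac{\mathrm{inrc}_y(n)}{n} = 1 \quad \text{and} \quad \limsup_{n \to \infty} \frac{\mathrm{inrc}_y(n)}{n} \le 2. \]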

To obtain a result over a 3-letter alphabet we will need a morphism $\sigma$
(found by Brandenburg [3, Theorem 4]), which maps squarefree words on
$\{a, b, c, d, e\}$ to squarefree words on $\{a, b, c\}$. We define it by

σ(a) = abacabcacbabcbacbc
σ(b) = abacabcacbacabacbc
σ(c) = abacabcacbcabcbabc
σ(d) = abacabcbacabacbabc
σ(e) = abacabcbacbcacbabc.

Theorem 35. The word $z = \sigma(y)$ is squarefree and has
$\mathrm{inrc}_z(n) < 3n$ for all $n \ge 36$.

Proof. Since $|\sigma(m)| = 18$ for $m \in \{a, b, c, d, e\}$ and
$\mathrm{inrc}_y(n) \le 2^{k+2} - 2$ if

\[ 2^{k+1} - 2(k-1) - 4 < n \le 2^{k+2} - 2k - 4, \]

then it follows that if

\[ 18(2^{k+1} - 2(k-1) - 4) < n \le 18(2^{k+2} - 2k - 4), \]

then $\mathrm{inrc}_z(n) \le 18(2^{k+2} - 2)$. The expression

\[ \frac{18(2^{k+2} - 2)}{18(2^{k+1} - 2(k-1) - 4) + 1} \]

is a decreasing function of $k$ and is less than 3 for $k \ge 4$. Along with
the fact that $\mathrm{inrc}_z(n) < 3n$ for $36 \le n \le (18)(22) = 396$,
which can be obtained through computation, $\mathrm{inrc}_z(n) < 3n$ for all
$n \ge 36$. □
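Two ingredients of this proof are simple to check by machine: the five images
under the 3-letter morphism all have length 18, and the displayed ratio drops
below 3 from $k = 4$ onwards. The Python sketch below does both; the dictionary
name `BRANDENBURG`, the function `ratio_z` and the tested range of $k$ are
assumptions made for illustration.

```python
from fractions import Fraction

# Images of the five letters, copied from the definition of the morphism.
BRANDENBURG = {
    "a": "abacabcacbabcbacbc",
    "b": "abacabcacbacabacbc",
    "c": "abacabcacbcabcbabc",
    "d": "abacabcbacabacbabc",
    "e": "abacabcbacbcacbabc",
}


def ratio_z(k: int) -> Fraction:
    """18(2^(k+2) - 2) / (18(2^(k+1) - 2(k-1) - 4) + 1), as in the proof."""
    return Fraction(
        18 * (2 ** (k + 2) - 2),
        18 * (2 ** (k + 1) - 2 * (k - 1) - 4) + 1,
    )


image_lengths = {letter: len(image) for letter, image in BRANDENBURG.items()}
ratios = [ratio_z(k) for k in range(4, 41)]

uniform = set(image_lengths.values()) == {18}
below_three = all(value < 3 for value in ratios)
decreasing = all(a > b for a, b in zip(ratios, ratios[1:]))
print(uniform, ratio_z(4), below_three, decreasing)
```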

Furthermore, similar to the word $y$, we have

\[ \lim_{k \to \infty} \frac{18(2^{k+2} - 2)}{18(2^{k+2} - 2k - 4)} = 1 \quad \text{and} \quad \lim_{k \to \infty} \frac{18(2^{k+2} - 2)}{18(2^{k+1} - 2(k-1) - 4) + 1} = 2, \]

meaning that

\[ \liminf_{n \to \infty} \frac{\mathrm{inrc}_z(n)}{n} = 1 \quad \text{and} \quad \limsup_{n \to \infty} \frac{\mathrm{inrc}_z(n)}{n} \le 2. \]

Also note that since the Thue-Morse word is overlap-free, Theorem 6 shows
that it is an example of an overlap-free word with
$\mathrm{inrc}_{\mathbf{m}}(n) < 3n$ for all $n \ge 1$.

8. Open questions


Question 1. Is the constant $1/(1 + \varphi^2)$ in Theorem 1 best possible?
Can it be replaced by 1?

Question 2. For each positive integer $d$, is it possible to construct an
infinite word $x$ whose initial non-repetitive complexity is $\Theta(n^d)$?
What are the possibilities for the usual factor complexity of such a word?

Question 3. Is the word $x$ of Theorem 33 the only (up to permutation of the
infinite alphabet) infinite squarefree word such that
$\mathrm{inrc}_x(n) \le 2n$ for all $n$?

Question 4. Are the examples given in Section 7 optimal for squarefree words:
i.e., are there squarefree words whose initial non-repetitive complexity
functions grow even slower than the examples given here?

Question 5. Can results similar to those proved here also be proved for the
function $\mathrm{nrc}_x(n)$ defined in the Introduction? A detailed study of
this function would be quite interesting.